Abstract

We study a long-standing conjecture on the necessary and sufficient conditions for the compatibility of multi-state characters: There exists a function f(r) such that, for any set C of r-state characters, C is compatible if and only if every subset of f(r) characters of C is compatible. We show that for every r ≥ 2, there exists an incompatible set C of Ω(r²) r-state characters such that every proper subset of C is compatible. This improves the previous lower bounds of f(r) ≥ r given by Meacham (1983) and f(4) ≥ 5 given by Habib and To (2011). For the case when r = 3, Lam, Gusfield and Sridhar (2011) recently showed that f(3) = 3. We give an independent proof of this result and completely characterize the sets of pairwise compatible 3-state characters by a single forbidden intersection pattern.

Our lower bound on f(r) is proven via a result on quartet compatibility that may be of independent interest: For every n ≥ 4, there exists an incompatible set Q of Ω(n²) quartets over n labels such that every proper subset of Q is compatible. We show that such a set of quartets can have size at most 3 when n = 5, and at most O(n³) for arbitrary n. We contrast our results on quartets with the case of rooted triplets: For every n ≥ 3, if R is an incompatible set of more than n − 1 triplets over n labels, then some proper subset of R is incompatible. We show this bound is tight by exhibiting, for every n ≥ 3, a set R of n − 1 triplets over n taxa such that R is incompatible, but every proper subset of R is compatible.

Background

The multi-state character compatibility (or perfect phylogeny) problem is a basic question in computational phylogenetics [1]. Given a set C of characters, we are asked whether there exists a phylogenetic tree that displays every character in C; if so, C is said to be compatible, and incompatible otherwise. The problem is known to be NP-complete [2,3], but certain special cases are known to be polynomially solvable [4-10]. See [11] for more on the perfect phylogeny problem.

In this paper we study a long-standing conjecture on the necessary and sufficient conditions for the compatibility of multi-state characters.

Conjecture 1. There exists a function f(r) such that, for any set C of r-state characters, C is compatible if and only if every subset of f(r) characters of C is compatible.

If Conjecture 1 is true, it would follow that we can determine if any set C of r-state characters is compatible by testing the compatibility of each subset of f(r) characters of C and, in case of incompatibility, output a subset of at most f(r) characters of C that is incompatible. This would allow us to reduce the character removal problem (i.e., finding a subset of characters to remove from C so that the remaining characters are compatible) to f(r)-hitting set, which is fixed-parameter tractable [12].

A classic result on binary character compatibility shows that f(2) = 2; see [1,6,13-15]. In 1975, Fitch [16,17] gave an example of a set C of three 3-state characters such that C is incompatible, but every pair of characters in C is compatible, showing that f(3) ≥ 3. In 1983, Meacham [15] generalized this example to r-state characters for every r ≥ 3, demonstrating a lower bound of f(r) ≥ r for all r; see also [9]. For the case of r = 3, Lam, Gusfield, and Sridhar [9] recently established that f(3) = 3.

While the previous results could lead one to conjecture that f(r) = r for all r, Habib and To [18] recently disproved this possibility by exhibiting a set C of five 4-state characters such that C is incompatible, but every proper subset of the characters in C is compatible, showing that f(4) ≥ 5. They conjectured that f(r) ≥ r + 1 for every r ≥ 4.

The main result of this paper is to prove the conjecture stated in [18] by giving a quadratic lower bound on f(r). Formally, we show that for every r ≥ 2, there exists a set C of r-state characters such that all of the following conditions hold.

1. C is incompatible.

2. Every proper subset of C is compatible.

3. |C| = ⌊r/2⌋·⌈r/2⌉ + 1.

Therefore, f(r) ≥ ⌊r/2⌋·⌈r/2⌉ + 1 for every r ≥ 2.
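
As a quick sanity check on the bound just stated, the following Python sketch evaluates ⌊r/2⌋·⌈r/2⌉ + 1 for small r. The function name `lower_bound` is ours and purely illustrative; it does not appear in the paper.

```python
def lower_bound(r: int) -> int:
    """Return floor(r/2) * ceil(r/2) + 1, the lower bound on f(r) stated above."""
    if r < 2:
        raise ValueError("the bound is stated only for r >= 2")
    # For non-negative integers, (r + 1) // 2 equals ceil(r / 2).
    return (r // 2) * ((r + 1) // 2) + 1


if __name__ == "__main__":
    for r in range(2, 9):
        print(f"r = {r}: f(r) >= {lower_bound(r)}")
```

For r = 2, 3, and 4 the sketch gives 2, 3, and 5, which agrees with f(2) = 2, f(3) = 3, and the bound on f(4) from [18].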

Our proof relies on a new result on quartet compatibility that we believe is of independent interest. We show that for every n ≥ 4, there exists a set Q of quartets over a set of n labels such that all of the following conditions hold.

1. Q is incompatible.

2. Every proper subset of Q is compatible.

3. |Q| = ⌊(n − 2)/2⌋·⌈(n − 2)/2⌉ + 1.

This is an improvement over the previous lower bound of n − 2 on the maximum cardinality of such an incompatible set of quartets, given in [3]. We show that such a set of quartets can have size at most 3 when n = 5, and at most O(n³) for arbitrary n. We note here that the construction given in [18] showing that f(4) ≥ 5 can be viewed as a special case of the construction given here when n = 6.

We study the compatibility of three-state characters further. The work of [9] completely characterized the sets of pairwise compatible 3-state characters by the existence of one of four forbidden intersection patterns. An alternative characterization of this result was given in [10] and was partially derived using the results of [9]. In this paper, we give a proof that f(3) = 3 that is independent of the results in [9], and we completely characterize the sets of pairwise compatible 3-state characters by a single forbidden intersection pattern.

We contrast our result on quartet compatibility with a result on the compatibility of rooted triplets: For every n ≥ 3, if R is an incompatible set of triplets over n labels, and |R| > n − 1, then some proper subset of R is incompatible. We show this bound is tight by exhibiting, for every n ≥ 3, a set R of n − 1 triplets over n labels such that R is incompatible, but every proper subset of R is compatible.

Preliminaries 

Given a graph G, we represent the vertices and edges of G by V(G) and E(G) respectively. We use the abbreviated notation uv for an edge {u, v} ∈ E(G). For any e ∈ E(G), G − e represents the graph obtained from G by deleting edge e. For an integer i, we use [i] to represent the set {1, 2, ..., i}.



Unrooted phylogenetic trees 

An unrooted phylogenetic tree (or just tree) is a tree T whose leaves are in one-to-one correspondence with a label set L(T), and which has no vertex of degree two. See Figure 1(a) for an example. For a collection 𝒯 of trees, the label set of 𝒯, denoted L(𝒯), is the union of the label sets of the trees in 𝒯. A tree is binary if every internal (non-leaf) vertex has degree three. A quartet is a binary tree with exactly four leaves. A quartet with label set {a, b, c, d} is denoted ab|cd if the path between the leaves labeled a and b does not intersect the path between the leaves labeled c and d.

For a tree T and a label set L ⊆ L(T), the restriction of T to L, denoted by T|L, is the tree obtained from the minimal subtree of T connecting all the leaves with labels in L by suppressing vertices of degree two. See Figure 1(b) for an example. A tree T displays another tree T′ if T′ can be obtained from T|L(T′) by contracting edges. A tree T displays a collection 𝒯 of trees if T displays every tree in 𝒯. If such a tree T exists, then we say that 𝒯 is compatible; otherwise, we say that 𝒯 is incompatible. See Figure 1(a) for an example. Determining if a collection of unrooted trees is compatible is NP-complete [3].
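
The definition of a quartet can be checked mechanically. The sketch below is our own illustration, not the paper's method: it tests whether an unrooted tree displays ab|cd by comparing path lengths, using the fact that in a tree the a-b path and the c-d path are disjoint exactly when d(a,b) + d(c,d) is strictly smaller than both other pairings. The helper names and the five-leaf example tree (with hypothetical internal nodes `u`, `v`, `w`) are assumptions made for this example.

```python
from collections import deque


def build_tree(edges):
    """Adjacency map of an undirected tree given as a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def distances_from(adj, source):
    """Edge-count distance from source to every vertex (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def displays_quartet(adj, a, b, c, d):
    """True if the tree displays ab|cd, i.e. the a-b path misses the c-d path."""
    dist = {x: distances_from(adj, x) for x in (a, b, c, d)}
    same = dist[a][b] + dist[c][d]
    cross1 = dist[a][c] + dist[b][d]
    cross2 = dist[a][d] + dist[b][c]
    # The split ab|cd holds exactly when this pairing is strictly smallest.
    return same < cross1 and same < cross2


# Illustrative caterpillar tree: cherry (a, b), then leaf e, then cherry (c, d).
example_tree = build_tree([
    ("a", "u"), ("b", "u"), ("u", "v"), ("e", "v"),
    ("v", "w"), ("c", "w"), ("d", "w"),
])
```

On this example tree, `displays_quartet(example_tree, "a", "b", "c", "e")` is true while `displays_quartet(example_tree, "a", "c", "b", "e")` is false. Checking a whole collection of quartets against one given tree is easy in this way; the hard part, as noted above, is deciding whether any such tree exists.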

Multi-state characters 

There is also a notion of compatibility for sets of partitions of a label set L. A character χ on L is a partition of L; the parts of χ are called states. If χ has at most r parts, then χ is an r-state character. Given a tree T with L = L(T) and a state s of χ, we denote by $T_s(\chi)$ the minimal subtree of T connecting all leaves with labels having state s for χ. We say that χ is convex on T, or equivalently T displays χ, if the subtrees $T_i(\chi)$ and $T_j(\chi)$ are vertex disjoint for all states i and j of χ where i ≠ j. A collection C of characters is compatible if there exists a tree T on which every character in C is convex. If no such tree exists, then we say that C is incompatible. See Figure 1(a) for an example.



Figure 1 A phylogenetic tree and a restricted subtree. (a) shows a tree T witnessing that the quartets $q_1 = ab|ce$, $q_2 = cd|bf$, and $q_3 = ad|ef$ are compatible; T is also a witness that the characters $\chi_{q_1} = ab|ce|d|f$, $\chi_{q_2} = cd|bf|a|e$, and $\chi_{q_3} = ad|ef|b|c$ are compatible; (b) shows T|{a, b, c, d, e}.

The perfect phylogeny problem (or character compatibility problem) is to determine whether a given set of characters is compatible.

For a collection C of characters, the intersection graph of C, which we will denote by G(C), is the undirected graph G = (V, E) which has a vertex $c_i$ for each character c ∈ C and each state i of c, and an edge $c_i d_j$ precisely when there is a taxon having state i for character c and state j for character d. Note that G(C) cannot have an edge between vertices associated with different states of the same character.

A graph G is chordal if there are no induced chordless cycles of length four or greater in G. In [19], Buneman established a fundamental connection between the perfect phylogeny problem and chordal graphs, which we now describe. For a given set C of characters, suppose we color the vertices of G(C) by assigning a unique color to each character c ∈ C, and giving each vertex of G(C) corresponding to a state of c the color assigned to the character c. A proper triangulation of G(C) is a chordal supergraph of G(C) such that every edge has endpoints with different colors.

Theorem 1. A set C of characters is compatible if and only if G(C) has a proper triangulation.

Since there is no proper triangulation for a cycle in G(C) 
involving only vertices from two characters, we have the 
following corollary. 

Corollary 1. Let C be a collection of two characters. Then C is compatible if and only if G(C) is acyclic.
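
Corollary 1 translates directly into a pairwise test. The following sketch, under our own representation (a character is a list of sets of taxa, one set per state), builds the edges of the intersection graph of two characters and uses a union-find structure to detect a cycle. The function names are illustrative only, and the test applies to pairs of characters, not to larger sets.

```python
from itertools import product


def intersection_edges(c1, c2):
    """Pairs (i, j) such that state i of c1 and state j of c2 share a taxon."""
    return {
        (i, j)
        for (i, s), (j, t) in product(enumerate(c1), enumerate(c2))
        if s & t
    }


def pair_is_compatible(c1, c2):
    """Corollary 1: two characters are compatible iff their intersection graph is acyclic."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in intersection_edges(c1, c2):
        root_u, root_v = find(("c1", i)), find(("c2", j))
        if root_u == root_v:
            return False  # this edge closes a cycle
        parent[root_u] = root_v
    return True
```

For instance, the binary characters {a,b}|{c,d} and {a,c}|{b,d} realize all four state combinations, so their intersection graph is a 4-cycle and the pair is reported incompatible.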

Quartet rules 

We now introduce quartet (closure) rules, which were originally used in the contexts of psychology [20] and linguistics [21]. The idea is that for a collection Q of quartets, any tree that displays Q may also necessarily display another quartet q ∉ Q, and if so we write Q ⊢ q.

Example 1. Let Q = {ab|ce, ae|cd}. Then the tree of Figure 1(b) displays Q, and furthermore, it is easy to see that it is the only tree that displays Q. Hence, Q ⊢ ab|de, Q ⊢ ab|cd, and Q ⊢ be|cd.

We use the following quartet rules in this paper:

{ab|cd, ab|ce} ⊢ ab|de   (R1)

{ab|cd, ac|de} ⊢ ab|ce   (R2)

For the purposes of this paper, we define the closure of an arbitrary collection Q of quartets, denoted $Q^*$, as the minimal set of quartets that contains Q and has the property that if, for some $q_1, q_2 \in Q^*$, $\{q_1, q_2\} \vdash q_3$ using either (R1) or (R2), then $q_3 \in Q^*$. Clearly, any tree that displays Q must also display $Q^*$. We will use the following lemma, which follows by repeated application of (R1) and is formally proven in [22].

Lemma 1. Let Q be an arbitrary set of quartets with $\{x, y, z_1, \ldots, z_k\} \subseteq L(Q)$. If

$$\bigcup_{i=1}^{k-1} \{xy|z_i z_{i+1}\} \subseteq Q^*,$$

then $xy|z_1 z_k \in Q^*$.

We refer the reader to [1,23] for more on quartet rules. 
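
The closure under (R1) and (R2) can be computed by a simple fixed-point iteration. The sketch below is one possible implementation under our own conventions: a quartet ab|cd is stored as a frozenset of its two sides so that ab|cd, ba|cd, and cd|ab coincide, and the function names are hypothetical. It is meant for small examples and makes no attempt at efficiency.

```python
def quartet(side1, side2):
    """Canonical form of the quartet side1|side2 (order of sides and labels is irrelevant)."""
    return frozenset({frozenset(side1), frozenset(side2)})


def ordered_sides(q):
    """Both orderings (p, r) of the two sides of quartet q."""
    p, r = tuple(q)
    return [(p, r), (r, p)]


def apply_rules(q1, q2):
    """Quartets obtained from the ordered pair (q1, q2) by one use of (R1) or (R2)."""
    derived = set()
    for p, r in ordered_sides(q1):
        for s, u in ordered_sides(q2):
            # (R1): {ab|cd, ab|ce} |- ab|de, with p = s = {a, b}.
            if p == s and len(r & u) == 1:
                derived.add(quartet(p, r ^ u))
            # (R2): {ab|cd, ac|de} |- ab|ce, with p = {a, b}, r = {c, d},
            # s = {a, c}, u = {d, e}.
            if (len(s & p) == 1 and len(s & r) == 1
                    and len(u & r) == 1 and not (u & p)):
                derived.add(quartet(p, (s & r) | (u - r)))
    return derived


def closure(quartets):
    """Smallest superset of `quartets` closed under (R1) and (R2)."""
    closed = set(quartets)
    frontier = set(closed)
    while frontier:
        new = set()
        for q1 in frontier:
            for q2 in closed:
                new |= apply_rules(q1, q2) | apply_rules(q2, q1)
        frontier = new - closed
        closed |= frontier
    return closed
```

Applied to the set {ab|ce, ae|cd} of Example 1, `closure` adds the quartets ab|de, ab|cd, and be|cd, matching the three inferences listed there.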

Incompatible quartets

For every s, t ≥ 2, we fix a set of labels $L_{s,t} = \{a_1, a_2, \ldots, a_s, b_1, b_2, \ldots, b_t\}$ and define the set

$$Q_{s,t} = \{a_1 b_1|a_s b_t\} \cup \bigcup_{i=1}^{s-1} \bigcup_{j=1}^{t-1} \{a_i a_{i+1}|b_j b_{j+1}\}$$

of quartets with $L(Q_{s,t}) = L_{s,t}$. We denote the quartet $a_1 b_1|a_s b_t$ by $q_0$, and a quartet of the form $a_i a_{i+1}|b_j b_{j+1}$ by $q_{i,j}$.

Observation 1. For all s, t ≥ 2, $|Q_{s,t}| = (s - 1)(t - 1) + 1$.
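
To make the construction concrete, the following sketch enumerates the quartets of $Q_{s,t}$ and lets one confirm the count in Observation 1. The representation (a quartet as a pair of label pairs, labels as strings such as `"a1"`) and the name `build_q` are our own choices for illustration.

```python
def build_q(s: int, t: int):
    """Return (q0, grid) for Q_{s,t}.

    q0 is a_1 b_1 | a_s b_t, and grid[(i, j)] is a_i a_{i+1} | b_j b_{j+1}
    for 1 <= i < s and 1 <= j < t.
    """
    if s < 2 or t < 2:
        raise ValueError("the construction requires s, t >= 2")
    q0 = (("a1", "b1"), (f"a{s}", f"b{t}"))
    grid = {
        (i, j): ((f"a{i}", f"a{i + 1}"), (f"b{j}", f"b{j + 1}"))
        for i in range(1, s)
        for j in range(1, t)
    }
    return q0, grid


if __name__ == "__main__":
    q0, grid = build_q(3, 3)
    print(len(grid) + 1)  # (3 - 1) * (3 - 1) + 1 quartets over six labels
```

With s = t = 3 the set has five quartets over six labels.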

Lemma 2. For all s, t ≥ 2, $Q_{s,t}$ is incompatible.

Proof. For each i ∈ [s − 1],

$$\bigcup_{j=1}^{t-1} \{a_i a_{i+1}|b_j b_{j+1}\} \subseteq Q_{s,t} \subseteq Q_{s,t}^*.$$

Then, by Lemma 1, it follows that for each i ∈ [s − 1], $a_i a_{i+1}|b_1 b_t \in Q_{s,t}^*$. So,

$$\bigcup_{i=1}^{s-1} \{b_1 b_t|a_i a_{i+1}\} \subseteq Q_{s,t}^*.$$

Then, again by Lemma 1, it follows that $b_1 b_t|a_1 a_s \in Q_{s,t}^*$. But then $\{a_1 b_1|a_s b_t,\ b_1 b_t|a_1 a_s\} \subseteq Q_{s,t}^*$. It follows that any tree that displays $Q_{s,t}$ must display both $a_1 b_1|a_s b_t$ and $b_1 b_t|a_1 a_s$. However, no such tree exists. Hence, $Q_{s,t}$ is incompatible. □

Lemma 3. For all s, t ≥ 2, every proper subset of $Q_{s,t}$ is compatible.

Proof. Since every subset of a compatible set of quartets is compatible, it suffices to show that for every $q \in Q_{s,t}$, $Q_{s,t} \setminus \{q\}$ is compatible. Let $q \in Q_{s,t}$. Either $q = q_0$ or $q = q_{x,y}$ for some 1 ≤ x < s and 1 ≤ y < t. In either case, we exhibit a tree witnessing that $Q_{s,t} \setminus \{q\}$ is compatible.

Case 1. Suppose $q = q_0$. We build the tree T as follows: There is a node for each label in $L_{s,t}$ and two additional nodes a and b along with the edge ab. There is an edge $a_x a$ for every $a_x \in L_{s,t}$, and an edge $b_x b$ for every $b_x \in L_{s,t}$. There are no other nodes or edges in T. See Figure 2(a) for an illustration. Now consider any quartet $q' \in Q_{s,t} \setminus \{q_0\}$. Then $q' = a_i a_{i+1}|b_j b_{j+1}$ for some 1 ≤ i < s and 1 ≤ j < t. Then, the minimal subgraph of T connecting leaves with labels in $\{a_i, a_{i+1}, b_j, b_{j+1}\}$ is the quartet q′. Hence T displays q′.

Case 2. Suppose $q = q_{x,y}$ for some 1 ≤ x < s and 1 ≤ y < t. We build the tree T as follows: There is a node for each label in $L_{s,t}$ and six additional nodes $a_\ell$, $b_\ell$, $\ell$, $h$, $a_h$, and $b_h$. There are edges $a_\ell \ell$, $b_\ell \ell$, $\ell h$, $h a_h$, and $h b_h$. For every $a_i \in L_{s,t}$, there is an edge $a_i a_\ell$ if i ≤ x, and an edge $a_i a_h$ if i > x. For every $b_j \in L_{s,t}$, there is an edge $b_j b_\ell$ if j ≤ y, and an edge $b_j b_h$ if j > y. There are no other nodes or edges in T. See Figure 2(b). Now consider any quartet $q' \in Q_{s,t} \setminus \{q_{x,y}\}$. Either $q' = q_0$ or $q' = q_{i,j}$ where i ≠ x or j ≠ y. If $q' = q_0$, then the minimal subgraph of T connecting leaves with labels in $\{a_1, b_1, a_s, b_t\}$ is the subtree of T induced by the nodes in $\{a_1, a_\ell, \ell, b_\ell, b_1, a_s, a_h, h, b_h, b_t\}$. Suppressing all degree two vertices results in a tree that is the same as $q_0$. So T displays q′. So assume that $q' = a_i a_{i+1}|b_j b_{j+1}$ where i ≠ x or j ≠ y. We define the following subset of the nodes in T:

$$V = \begin{cases}
\{a_i, a_{i+1}, a_\ell, \ell, b_\ell, b_j, b_{j+1}\} & \text{if } i < x \text{ and } j < y, \\
\{a_i, a_{i+1}, a_\ell, \ell, b_y, b_\ell, h, b_h, b_{y+1}\} & \text{if } i < x \text{ and } j = y, \\
\{a_i, a_{i+1}, a_\ell, \ell, h, b_h, b_j, b_{j+1}\} & \text{if } i < x \text{ and } j > y, \\
\{a_x, a_\ell, \ell, h, a_h, a_{x+1}, b_\ell, b_j, b_{j+1}\} & \text{if } i = x \text{ and } j < y, \\
\{a_x, a_\ell, \ell, h, a_h, a_{x+1}, b_h, b_j, b_{j+1}\} & \text{if } i = x \text{ and } j > y, \\
\{a_i, a_{i+1}, a_h, h, \ell, b_\ell, b_j, b_{j+1}\} & \text{if } i > x \text{ and } j < y, \\
\{a_i, a_{i+1}, a_h, h, b_y, b_\ell, \ell, b_h, b_{y+1}\} & \text{if } i > x \text{ and } j = y, \\
\{a_i, a_{i+1}, a_h, h, b_h, b_j, b_{j+1}\} & \text{if } i > x \text{ and } j > y.
\end{cases}$$

Now, the subgraph of T induced by the nodes in V is the minimal subgraph of T connecting leaves with labels in q′. Suppressing all degree two vertices gives q′. Hence, T displays q′. □

Figure 2 Illustrating the proof of Lemma 3. (a) Case 1: a tree that displays $Q_{s,t} \setminus \{q_0\}$. (b) Case 2: a tree that displays $Q_{s,t} \setminus \{q_{x,y}\}$.



With s = ⌊n/2⌋ and t = ⌈n/2⌉, Observation 1 and Lemmas 2 and 3 imply the following theorem.

Theorem 2. For every integer n ≥ 4, there exists a set Q of quartets over n taxa such that all of the following conditions hold.

1. Q is incompatible.

2. Every proper subset of Q is compatible.

3. |Q| = ⌊(n − 2)/2⌋·⌈(n − 2)/2⌉ + 1.

Incompatible quartets on five taxa 

When Q is a set of quartets over five taxa, we show that the set of quartets given by Theorem 2 is as large as possible. We hope that the technique used in the proof of the following theorem might be useful in proving tight bounds for n > 5.

Theorem 3. If Q is an incompatible set of quartets over five taxa such that every proper subset of Q is compatible, then |Q| ≤ 3.

Proof. Let Q be an incompatible set of quartets with L(Q) = {a, b, c, d, e} and $q_0 = ab|cd \in Q$. We will show that Q contains an incompatible subset of at most three quartets. If Q contains two different quartets on the same four taxa, then Q must contain an incompatible pair of quartets. So, we may assume that each quartet is on a unique subset of four of the five taxa. Hence, every pair of quartets in Q shares three taxa in common. We have the following two cases.

Case 1: Q contains at least one of the quartets ac|be, ac|de, ad|be, ad|ce, ae|bc, ae|bd, bc|de, or bd|ce. W.l.o.g. we may assume that Q contains $q_1 = ac|de$, as all other cases are symmetric. By (R2), $\{q_0, q_1\} \vdash ab|ce$. Then, by (R1), $\{q_0, q_1, ab|ce\} \vdash ab|de$. Then, again by (R1), $\{q_0, q_1, ab|ce, ab|de\} \vdash bc|de$. Now let $Q' = \{q_0, q_1, ab|ce, ab|de, bc|de\}$. Any quartet in Q must either be in Q′ or be pairwise incompatible with a quartet in Q′. Since Q′ is compatible but, by assumption, Q is incompatible, Q must contain a quartet $q_2$ that is pairwise incompatible with some quartet in Q′. Hence, $\{q_0, q_1, q_2\}$ is an incompatible subset of Q.

Case 2: Q contains none of the quartets ac|be, ac|de, ad|be, ad|ce, ae|bc, ae|bd, bc|de, or bd|ce. Then every quartet in Q is either of the form ab|xy where {x, y} ≠ {c, d}, or cd|xy where {x, y} ≠ {a, b}. But then Q is compatible, contradicting our assumption that Q is incompatible.

In either case, the theorem holds. □

Incompatible quartets on arbitrarily many taxa 

We say a set Q of compatible quartets is redundant if, for some q ∈ Q, Q \ {q} ⊢ q; otherwise, we say that Q is irredundant. The following lemma establishes a connection between sets of irredundant quartets and minimal sets of incompatible quartets.

Lemma 4. If Q is incompatible, but every proper subset of Q is compatible, then every proper subset of Q is irredundant.

Proof. Suppose that Q is incompatible and every proper subset of Q is compatible. Furthermore, suppose that some proper subset Q′ of Q is redundant. Since every compatible superset of a redundant set of quartets is also redundant, we may assume, w.l.o.g., that there is a unique quartet q ∈ Q \ Q′ (i.e., |Q| = |Q′| + 1). Since Q′ is redundant, there exists a q′ ∈ Q′ such that Q′ \ {q′} ⊢ q′. But then (Q′ \ {q′}) ∪ {q} is incompatible, contradicting that every proper subset of Q is compatible. □

It follows from Lemma 4 that any upper bound on the maximum cardinality of an irredundant set of quartets can be used to place an upper bound on the maximum cardinality of a set of quartets satisfying the first two conditions of Theorem 2. The following theorem is from [22].

Theorem 4. Let Q be a set of quartets over a set of n taxa. If Q is irredundant, then Q has cardinality at most (n − 3)(n − 2)²/3.

Lemma 4 together with Theorem 4 gives the following upper bound on the maximum cardinality of a set Q of quartets over n ≥ 5 taxa that satisfies the first two conditions of Theorem 2.

Theorem 5. Let Q be a set of incompatible quartets over a set of n taxa such that every proper subset of Q is compatible. Then |Q| ≤ (n − 3)(n − 2)²/3 + 1.
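
To get a feel for the gap between the quadratic lower bound of Theorem 2 and the cubic upper bound of Theorem 5, the two expressions can be tabulated side by side. This is only an arithmetic illustration; the function names are ours, and the upper bound is rounded down because a cardinality is an integer.

```python
def quartet_lower_bound(n: int) -> int:
    """Size of the minimal incompatible set given by Theorem 2 (n >= 4)."""
    if n < 4:
        raise ValueError("Theorem 2 is stated for n >= 4")
    r = n - 2
    return (r // 2) * ((r + 1) // 2) + 1


def quartet_upper_bound(n: int) -> int:
    """Floor of (n - 3)(n - 2)^2 / 3 + 1, the bound of Theorem 5."""
    return (n - 3) * (n - 2) ** 2 // 3 + 1


if __name__ == "__main__":
    for n in range(5, 11):
        print(n, quartet_lower_bound(n), quartet_upper_bound(n))
```

For n = 5 the two functions return 3 and 7; Theorem 3 shows that in this case the lower bound is the true maximum.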

Incompatible characters 

There is a natural correspondence between quartet compatibility and character compatibility that we now describe. Let Q be a set of quartets, n = |L(Q)|, and r = n − 2. For each q = ab|cd ∈ Q, we define the r-state character corresponding to q, denoted $\chi_q$, as the character where a and b have state 0 for $\chi_q$, c and d have state 1 for $\chi_q$, and, for each ℓ ∈ L(Q) \ {a, b, c, d}, there is a state s of $\chi_q$ such that ℓ is the only label with state s for character $\chi_q$ (see Example 2). We define the set of r-state characters corresponding to Q by

$$C_Q = \bigcup_{q \in Q} \{\chi_q\}.$$

Example 2. Consider the quartets and characters given in Figure 1(a): $\chi_{q_1}$ is the character corresponding to $q_1$, $\chi_{q_2}$ is the character corresponding to $q_2$, and $\chi_{q_3}$ is the character corresponding to $q_3$.
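
The correspondence between a quartet and its character is simple enough to state as code. In the sketch below (our own illustration; the name `character_from_quartet` and the list-of-frozensets representation are assumptions), the first two states are the two sides of the quartet and every remaining label receives a singleton state of its own.

```python
def character_from_quartet(quartet, labels):
    """Return the (n - 2)-state character for `quartet` over `labels`.

    `quartet` is a pair of sides, e.g. ("ab", "ce") for ab|ce. The result lists
    the two sides first, followed by one singleton state per remaining label.
    """
    left, right = (frozenset(side) for side in quartet)
    rest = sorted(set(labels) - left - right)
    return [left, right] + [frozenset({x}) for x in rest]


if __name__ == "__main__":
    # The quartet ab|ce over the six labels a..f, as in the caption of Figure 1.
    print(character_from_quartet(("ab", "ce"), "abcdef"))
```

For ab|ce over the labels a through f this yields the four states {a,b}, {c,e}, {d}, {f}, that is, the character ab|ce|d|f.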

The following lemma relating quartet compatibility to character compatibility is well known [24], and its proof is omitted here.

Lemma 5. A set Q of quartets is compatible if and only if $C_Q$ is compatible.

The next theorem allows us to use our result on quartet compatibility to establish a lower bound on f(r).

Theorem 6. Let Q be a set of incompatible quartets over n labels such that every proper subset of Q is compatible, and let r = n − 2. Then, there exists a set C of |Q| r-state characters such that C is incompatible, but every proper subset of C is compatible.

Proof. We claim that $C_Q$ is such a set of incompatible r-state characters. Since for two distinct quartets $q_1, q_2 \in Q$, $\chi_{q_1} \neq \chi_{q_2}$, it follows that $|C_Q| = |Q|$. Since Q is incompatible, it follows by Lemma 5 that $C_Q$ is incompatible. Let C′ be any proper subset of $C_Q$. Then, there is a proper subset Q′ of Q such that $C' = C_{Q'}$. Then, since Q′ is compatible, it follows by Lemma 5 that C′ is compatible. □

Theorem 2 together with Theorem 6 gives the main theorem of this paper.

Theorem 7. For every integer r ≥ 2, there exists a set C of r-state characters such that all of the following hold.

1. C is incompatible.

2. Every proper subset of C is compatible.

3. |C| = ⌊r/2⌋·⌈r/2⌉ + 1.

Proof. By Theorem 2 and Observation 1, there exists a set Q of ⌊r/2⌋·⌈r/2⌉ + 1 quartets over r + 2 labels that is incompatible, but every proper subset of which is compatible, namely $Q_{\lfloor (r+2)/2 \rfloor, \lceil (r+2)/2 \rceil}$. The theorem follows from Theorem 6. □

The quadratic lower bound on f(r) follows from Theorem 7.

Corollary 2. f(r) ≥ ⌊r/2⌋·⌈r/2⌉ + 1.

Three-State Characters

In the remainder of this section we focus on the case when r = 3, and thus fix C to be an arbitrary set of 3-state characters over a set S of taxa. Lam, Gusfield, and Sridhar [9] recently established that f(3) = 3, and they completely characterized the sets of pairwise compatible 3-state characters by the existence of one of four forbidden intersection patterns. We give an independent proof that f(3) = 3. We then completely characterize the sets of pairwise compatible 3-state characters by a single forbidden intersection pattern. Our proof uses several structural results from the algorithm for the three-state perfect phylogeny problem given by Kannan and Warnow [7].

The Algorithm of Kannan and Warnow 

The algorithm of [7] takes a divide and conquer approach to determining the compatibility of a set of three-state characters. An instance is reduced to subproblems by finding a partition $S_1, S_2$ of the taxon set S of C with both of the following properties:

1. $2 \le |S_i| \le n - 2$, i = 1, 2.

2. Whenever C is compatible, there is a perfect phylogeny P that contains an edge e whose removal breaks P into subtrees $P_1$ and $P_2$ with $L(P_i) = S_i$, i = 1, 2.

A partition of S satisfying both of these properties is a legal partition, and the following theorem shows that finding such a partition for a given set of characters is the crux of the algorithm.

Theorem 8. [7] Given a set C of three-state characters, we can in O(nk) time either find a legal partition of S or determine that the set of characters is incompatible.

Finding a legal partition 

We now discuss the manner in which such a legal partition is found for a set of three-state characters C. Let T be a tree witnessing that C is compatible. The canonical labeling of T is the labeling where, for each internal node v of T and each character α ∈ C, if there are leaves x and y in different components of T − {v} such that α(x) = α(y), then α(v) = α(x); otherwise α(v) = *, where * denotes a dummy state for C. Note that such a labeling of T always exists and is unique. We will assume that every compatible tree for C is canonically labeled.

The tree-structure for a character α in T is formed by repeatedly contracting edges of T connecting nodes that have the same state (other than *) for α. Note that this tree does not depend on the sequence of edge-contractions and is thus well defined. Furthermore, there is exactly one node for each state (other than the dummy state) of α, and each node labeled by * has degree at least three. A tree-structure for α that is formed from some compatible tree for C is called a realizable tree-structure for α. There are four possible realizable tree-structures for a three-state character α, which are shown in Figure 3.

To find a realizable tree-structure for a character α, the algorithm examines the pairwise intersection patterns of α with every other character β ∈ C, and applies the following rules to rule out possible tree-structures for α.

Rule 1. Let α and β be two characters of C. If, under some relabeling of the states of α and β, we have that $\alpha_1 \subseteq \beta_1$, $\alpha_2 \cap \beta_2 \neq \emptyset$, and $\alpha_3 \cap \beta_2 \neq \emptyset$, then $P^1$ is not a realizable tree-structure for α. If this is the case, we say that α and β match Rule 1 with respect to $\alpha_1$.

Rule 2. Let α and β be two characters of C. If, under some relabeling of the states of α and β, we have that $\alpha_1 \cap \beta_1 \neq \emptyset$, $\alpha_2 \cap \beta_1 \neq \emptyset$, $\alpha_2 \cap \beta_2 \neq \emptyset$, and $\alpha_3 \cap \beta_2 \neq \emptyset$, then $P^2$ is the only possible realizable tree-structure for α. If this is the case, we say that α and β match Rule 2 with respect to $\alpha_2$.

The set $Q_\alpha$ of candidate tree-structures for α consists of all of those possible tree-structures for α that are not ruled out after comparing the intersection pattern of α with every other character in C and applying Rules 1 and 2.

The following theorem, which follows from [7], shows that a legal partition is found by choosing an arbitrary α ∈ C for which $Q_\alpha \neq \emptyset$. Furthermore, if there is an α ∈ C for which $Q_\alpha = \emptyset$, then C is incompatible.

Theorem 9 ([7]). If $Q_\alpha \neq \emptyset$, then we can find a legal partition of S.

Corollary 3. A set C of 3-state characters is compatible if and only if $Q_\alpha \neq \emptyset$ for every α ∈ C.

Tight bounds on three-state character compatibility 

We use Corollary 3 to give upper bounds on the maximum 
cardinality of a minimal set of incompatible three-state 
characters. 



Figure 3 The four possible realizable tree-structures for a three-state character α. (a) A path $P^i$ for each i ∈ {1, 2, 3}, in which $\alpha_i$ lies between the other two states. (b) A star $S^*$.

Theorem 10. Let C be a set of three-state characters on species set S. Then C is incompatible if and only if there exists a character α ∈ C, and two distinct states $\alpha_i$ and $\alpha_j$ of α, such that both of the following hold:

1. There is a β ∈ C where the intersection pattern of α and β matches Rule 2 with respect to $\alpha_i$.

2. There is a γ ∈ C where the intersection pattern of α and γ matches Rule 2 with respect to $\alpha_j$.
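
The criterion of Theorem 10 can be searched for directly. The sketch below is our own literal reading of Rule 2, not the algorithm of [7]: a character is a list of three sets of taxa, and `rule2_states(alpha, beta)` returns the indices i for which some relabeling yields the pattern with `alpha[i]` in the middle role. All names are hypothetical, and the sketch assumes its input is pairwise compatible (which can be checked separately via Corollary 1). The three test characters are our own example, obtained by applying the quartet-to-character correspondence to $Q_{2,3}$.

```python
from itertools import permutations


def rule2_states(alpha, beta):
    """Indices i such that alpha and beta match Rule 2 with respect to alpha[i].

    Looks for distinct states a_j, a_i, a_k of alpha and distinct states
    b_p, b_q of beta with a_j & b_p, a_i & b_p, a_i & b_q, a_k & b_q all non-empty.
    """
    matched = set()
    for j, i, k in permutations(range(len(alpha)), 3):
        for p, q in permutations(range(len(beta)), 2):
            if (alpha[j] & beta[p] and alpha[i] & beta[p]
                    and alpha[i] & beta[q] and alpha[k] & beta[q]):
                matched.add(i)
    return matched


def theorem10_witness(characters):
    """Return (index of alpha, its Rule 2 states) if alpha has two such states, else None."""
    for x, alpha in enumerate(characters):
        states = set()
        for y, beta in enumerate(characters):
            if x != y:
                states |= rule2_states(alpha, beta)
        if len(states) >= 2:
            return x, states
    return None


# Characters corresponding to a1b1|a2b3, a1a2|b1b2, and a1a2|b2b3 (our example).
alpha = [{"a1", "b1"}, {"a2", "b3"}, {"b2"}]
beta = [{"a1", "a2"}, {"b1", "b2"}, {"b3"}]
gamma = [{"a1", "a2"}, {"b2", "b3"}, {"b1"}]
```

On this example, `theorem10_witness([alpha, beta, gamma])` reports the first character with two Rule 2 states, while every pair of the three characters yields `None`, in line with the set being incompatible although each proper subset is compatible.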

Proof. (⇒) If C is not pairwise compatible, then by Corollary 1, there is a pair α, β ∈ C whose intersection graph contains a cycle. Since the intersection graph is bipartite, this cycle must have length at least four and contain at least two states of each character. Let $\alpha_i$ and $\alpha_j$ be two states of α on this cycle. Then, the intersection pattern of α and β matches Rule 2 with respect to both $\alpha_i$ and $\alpha_j$, and so the theorem holds. So we may assume that C is incompatible but pairwise compatible.

It follows from Corollary 3 that there exists an α ∈ C such that $Q_\alpha = \emptyset$. Then there must exist a character β ∈ C such that the intersection pattern of α and β matches Rule 2 with respect to some state $\alpha_i$ of α; otherwise $S^* \in Q_\alpha$. Hence, $Q_\alpha \subseteq \{P^i\}$. Then, since $Q_\alpha = \emptyset$, there must be a character γ ∈ C such that the intersection pattern of α and γ places a constraint on $Q_\alpha$ that prevents $Q_\alpha$ from containing $P^i$. There are two possibilities.

Case 1: There is a state $\alpha_j$ of α where j ≠ i and the intersection pattern of α and γ matches Rule 2 with respect to $\alpha_j$. In this case the theorem holds.

Case 2: The intersection pattern of α and γ matches Rule 1 with respect to $\alpha_i$. W.l.o.g., we fix i = 1, and relabel the states of α, β, and γ so that $\alpha_1 \cap \beta_1 \neq \emptyset$, $\alpha_1 \cap \beta_2 \neq \emptyset$, $\alpha_2 \cap \beta_1 \neq \emptyset$, $\alpha_3 \cap \beta_2 \neq \emptyset$, $\alpha_1 \subseteq \gamma_1$, $\alpha_2 \cap \gamma_2 \neq \emptyset$, and $\alpha_3 \cap \gamma_2 \neq \emptyset$. Such a labeling exists since, by assumption, α and β match Rule 2 with respect to $\alpha_1$, and α and γ match Rule 1 with respect to $\alpha_1$.

If $\alpha_2 \cap \gamma_1 \neq \emptyset$, then the intersection pattern of α and γ matches Rule 2 with respect to $\alpha_2$, in which case the theorem holds. If $\alpha_3 \cap \gamma_1 \neq \emptyset$, then the intersection pattern of α and γ matches Rule 2 with respect to $\alpha_3$, in which case the theorem holds. So we may assume that $\alpha_1 = \gamma_1$. Now, since $\alpha_1 \cap \beta_1 \neq \emptyset$, $\alpha_1 \cap \beta_2 \neq \emptyset$, and $\alpha_1 = \gamma_1$, we have that both $\beta_1 \cap \gamma_1 \neq \emptyset$ and $\beta_2 \cap \gamma_1 \neq \emptyset$.

$\gamma_3$ must have a nonempty intersection with at least one state of α, and since $\alpha_1 = \gamma_1$, we have that $\alpha_1 \cap \gamma_3 = \emptyset$. So $\gamma_3$ has a nonempty intersection with either $\alpha_2$ or $\alpha_3$. Due to the symmetry of the intersection graph of α and β, we may assume, w.l.o.g., that $\alpha_3 \cap \gamma_3 \neq \emptyset$.

By assumption, $\alpha_2 \cap \gamma_1 = \emptyset$, and if $\alpha_2 \cap \gamma_3 \neq \emptyset$, then the intersection graph of α and γ contains a cycle, contradicting our assumption that C is pairwise compatible. So we may assume that $\alpha_2 \subseteq \gamma_2$. Then, since $\beta_1 \cap \alpha_2 \neq \emptyset$, we have that $\beta_1 \cap \gamma_2 \neq \emptyset$.

Let $s \in \alpha_3 \cap \beta_2$. Since, by assumption, $\alpha_3 \cap \gamma_1 = \emptyset$, we have that either $s \in \gamma_2$ or $s \in \gamma_3$. However, if $s \in \gamma_2$, then $\beta_2 \cap \gamma_2 \neq \emptyset$ and the intersection graph of β and γ contains a cycle, contradicting our assumption that C is pairwise compatible. Hence $s \in \gamma_3$ and $\beta_2 \cap \gamma_3 \neq \emptyset$.

We have now established all of the edges of the intersection graph of α, β, and γ represented by the solid edges in Figure 4. Now, let $s' \in \alpha_3 \cap \gamma_2$. Now s′ must be in some state of β. If $s' \in \beta_1$, then $s' \in \beta_1 \cap \alpha_3$ and the intersection graph of β and α contains a cycle, contradicting our assumption that C is pairwise compatible. If $s' \in \beta_2$, then $s' \in \beta_2 \cap \gamma_2$, and the intersection graph of β and γ contains a cycle, again contradicting our assumption that C is pairwise compatible. Hence $s' \in \beta_3$. Then, we have that $s' \in \beta_3 \cap \alpha_3$ and $s' \in \beta_3 \cap \gamma_2$, witnessing the dotted edges in Figure 4. So we have that the intersection pattern of β and α matches Rule 2 with $\beta_2$ as witness, and the intersection pattern of β and γ matches Rule 2 with $\beta_1$ as witness. Hence the theorem holds. □

Note that in the statement of Theorem 10, the characters β and γ are not necessarily distinct. In cases where they are not distinct, C contains an incompatible pair.

Corollary 4. A set C of 3-state characters is compatible if and only if every subset of at most three characters of C is compatible.

In [9], it was also shown that we can determine the compatibility of a pairwise compatible set C of three-state characters by testing the intersection patterns of C for the existence of one of a set of four forbidden patterns. As a corollary to Theorem 10, we have that a single forbidden pattern suffices to determine the compatibility of C.

Corollary 5. A pairwise compatible set C of 3-state characters is compatible if and only if the partition intersection graph of C does not contain, up to relabeling of characters and states, the subgraph of Figure 5.



Figure 4 Illustrating the proof of Theorem 10.

Figure 6 Example of rooted phylogenetic trees. (a) shows a tree T that is a witness that the triplets ab|c, de|b, ef|c, and ec|b are compatible; (b) shows the tree T restricted to the label set {a, b, c, e}.



Note that each edge of the graph of Figure 5 has one
endpoint which is a state of the character a. It follows that
we can find such a subgraph in the partition intersection
graph of C by testing the intersection pattern of each pair
of characters in C [10]. Furthermore, all p occurrences of
the forbidden subgraph in the intersection graph of m
characters on n taxa can be found in O(m²n + p) time.
Whereas the forbidden subgraph given here is witnessed by
eight taxa (or edges), each of the four forbidden subgraphs
of [9] is witnessed by five taxa, making them better suited
for taxon removal problems.
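
As a small illustration of the pairwise tests mentioned above, the following Python sketch builds the partition intersection graph of a set of characters and checks a pair for compatibility. It relies on the classical fact that two characters are compatible if and only if their partition intersection graph is acyclic. The data layout (one list of states per character, taxa in a fixed order) and all function names are assumptions made for this example; it does not implement the O(m²n + p) search for the forbidden subgraph.

```python
from itertools import combinations

def partition_intersection_edges(chars):
    """chars maps a character name to its list of states, one per taxon.

    Vertices are (character, state) pairs. Two states of different
    characters are joined when at least one taxon has both states.
    """
    edges = set()
    for (x, sx), (y, sy) in combinations(chars.items(), 2):
        for u, v in zip(sx, sy):
            edges.add(((x, u), (y, v)))
    return edges

def is_acyclic(edges):
    """Union-find cycle test on an undirected simple graph."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge would close a cycle
        parent[ru] = rv
    return True

def pair_compatible(states_x, states_y):
    """Two characters are compatible iff their graph is acyclic."""
    chars = {"x": states_x, "y": states_y}
    return is_acyclic(partition_intersection_edges(chars))

if __name__ == "__main__":
    print(pair_compatible([0, 0, 1, 1], [0, 0, 0, 1]))
    print(pair_compatible([0, 0, 1, 1, 2], [0, 1, 0, 1, 2]))
```

The first call describes a pair whose graph is a path, so it is reported compatible; the second pair contains a four-cycle on the states 0 and 1 and is reported incompatible.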

Incompatible Triplets 

A rooted phylogenetic tree (or just rooted tree) T is a tree
whose leaves are in one-to-one correspondence with a label
set L(T), that has a distinguished vertex called the root,
and in which no vertex other than the root has degree two.
See Figure 6(a) for an example. A rooted tree is binary if
the root vertex has degree two, and every other internal
(non-leaf) vertex has degree three. A triplet is a rooted
binary tree with exactly three leaves. A triplet with label
set {a, b, c} is denoted ab|c if the path between the leaves
labeled a and b avoids the path between the leaf labeled c
and the root vertex. For a tree T and a label set L ⊆ L(T),
let T' be the minimal subtree of T connecting all the leaves
with labels in L. The restriction of T to L, denoted by T|L,
is the rooted tree obtained from T' by distinguishing the
vertex closest to the root of T as the root of T' and
suppressing every vertex other than the root having degree
two. A rooted tree T displays another rooted tree T' if T'
can be obtained from T|L(T') by contracting edges. A rooted
tree T displays a collection of rooted trees 𝒯 if T displays
every tree in 𝒯. If such a tree T exists, then we say that
𝒯 is compatible; otherwise, we say that 𝒯 is incompatible.
Given a collection of rooted trees 𝒯, it can be determined
in polynomial time if 𝒯 is compatible [3,25].
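
The definitions of restriction and display can be made concrete with a short Python sketch. Trees are encoded as nested tuples with string leaves, which is an assumption of this example, as are the function names. The tree used below was chosen only because it displays the four triplets listed in the caption of Figure 6; it is not necessarily the tree drawn there.

```python
def restrict(tree, labels):
    """Return T|L for a nested-tuple tree, or None if no leaf is in L.

    Subtrees without any leaf in `labels` are dropped, and a vertex left
    with a single child is suppressed by returning that child directly.
    """
    if isinstance(tree, str):
        return tree if tree in labels else None
    kids = [r for r in (restrict(c, labels) for c in tree) if r is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]
    return tuple(kids)

def displays_triplet(tree, a, b, c):
    """True if the tree displays ab|c.

    A triplet is binary, so it is displayed exactly when the restriction
    of the tree to {a, b, c} has the shape ((a, b), c) up to child order.
    """
    r = restrict(tree, {a, b, c})
    if not isinstance(r, tuple) or len(r) != 2:
        return False
    return any(isinstance(x, tuple) and set(x) == {a, b} and y == c
               for x, y in (r, r[::-1]))

if __name__ == "__main__":
    # Hypothetical tree, chosen for illustration only.
    T = (("a", "b"), ("c", ("d", ("e", "f"))))
    for t in [("a", "b", "c"), ("d", "e", "b"),
              ("e", "f", "c"), ("e", "c", "b")]:
        print(t, displays_triplet(T, *t))
    print(restrict(T, {"a", "b", "c", "e"}))
```

Testing display through `restrict` keeps the code close to the definition in the text, at the cost of rebuilding a small tree for every query.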

Figure 5 The forbidden subgraph for 3-state character
compatibility.

The following theorems follow from the connection
between collections of unrooted trees with at least one
common label across all the trees, and collections of
rooted trees [3].

Theorem 11. Let Q be a collection of quartets where
every quartet in Q shares a common label ℓ. Let R be the
set of triplets such that there exists a triplet ab|c in R if
and only if there exists a quartet ab|cℓ in Q. Then, Q is
compatible if and only if R is compatible.
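
The correspondence in Theorem 11 is a simple relabeling, sketched below in Python. The encoding is an assumption of this example: a quartet ab|cd is a pair of two-element sets, and a triplet ab|c is a pair consisting of the set {a, b} and the label c.

```python
def quartets_to_triplets(quartets, ell):
    """Map each quartet ab|c(ell) to the triplet ab|c.

    Every quartet must contain the common label `ell`; the side holding
    `ell` supplies the third leaf c, the other side supplies {a, b}.
    """
    triplets = set()
    for side1, side2 in quartets:
        if ell in side1:
            side1, side2 = side2, side1
        if ell not in side2:
            raise ValueError("quartet does not contain the common label")
        (c,) = set(side2) - {ell}
        triplets.add((frozenset(side1), c))
    return triplets

if __name__ == "__main__":
    Q = [({"a", "b"}, {"c", "l"}), ({"d", "l"}, {"a", "c"})]
    print(quartets_to_triplets(Q, "l"))
```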

Let R be a collection of triplets. For a subset S ⊆ L(R),
we define the graph [R, S] as the graph having a vertex for
each label in S, and an edge {a, b} if and only if ab|c ∈ R
for some c ∈ S. The following theorem is from page
439 of [26].

Theorem 12. A collection R of rooted triplets is compatible
if and only if [R, S] is not connected for every S ⊆ L(R)
with |S| ≥ 3.
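
Theorem 12 can be checked directly on small instances. The Python sketch below enumerates every label subset S with |S| ≥ 3 and tests whether [R, S] is connected. It takes exponential time and is meant only to illustrate the statement; the encoding of a triplet ab|c as the tuple (a, b, c) and the function names are assumptions of this example.

```python
from itertools import combinations

def graph_connected(R, S):
    """True if [R, S] is connected.

    [R, S] has vertex set S and an edge {a, b} for every triplet ab|c
    in R whose three labels all lie in S.
    """
    S = set(S)
    adj = {v: set() for v in S}
    for a, b, c in R:
        if {a, b, c} <= S:
            adj[a].add(b)
            adj[b].add(a)
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj[v] - seen:
            seen.add(w)
            stack.append(w)
    return seen == S

def compatible_bruteforce(R):
    """R is compatible iff no subset S with |S| >= 3 has [R, S] connected."""
    labels = sorted({x for t in R for x in t})
    return not any(graph_connected(R, S)
                   for k in range(3, len(labels) + 1)
                   for S in combinations(labels, k))

if __name__ == "__main__":
    fig6 = [("a", "b", "c"), ("d", "e", "b"),
            ("e", "f", "c"), ("e", "c", "b")]
    print(compatible_bruteforce(fig6))
    print(compatible_bruteforce([("a", "b", "c"), ("a", "c", "b")]))
```

The first set consists of the triplets from the caption of Figure 6 and is reported compatible; the second pair, ab|c and ac|b, makes [R, {a, b, c}] connected and is reported incompatible.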

Corollary 6. Let R be a set of rooted triplets such that R
is incompatible but every proper subset of R is compatible.
Then, [R, L(R)] is connected.

We now contrast our result on quartet compatibility 
with a result on triplets. 

Theorem 13. For every n ≥ 3, if R is an incompatible set
of triplets over n labels, and |R| > n − 1, then some proper
subset of R is incompatible.

Proof. For the sake of contradiction, let R be a set of
triplets such that R is incompatible, every proper subset of
R is compatible, |L(R)| = n, and |R| > n − 1. The graph
[R, L(R)] will contain n vertices and at least n edges. Since
each triplet in R is distinct, there will be a cycle C of
length at least three in [R, L(R)]. Since R is incompatible
but every proper subset of R is compatible, by Corollary 6,
[R, L(R)] is connected.
Consider any edge e in the cycle C. Let t be the triplet
that contributed edge e in [R, L(R)]. Let R' = R \ {t}.
Since the graph [R, L(R)] − e is connected, [R', L(R')]
is connected. By Theorem 12, R' is incompatible. But
R' ⊂ R, contradicting that every proper subset of R is
compatible. □

To show the bound is tight, we first prove a more 
restricted form of Theorem 2. 

Theorem 14. For every n ≥ 4, there exists a set of quartets
Q with |L(Q)| = n, and a label ℓ ∈ L(Q), such that all
of the following hold.

1. Every q ∈ Q contains a leaf labeled by ℓ.

2. Q is incompatible.

3. Every proper subset of Q is compatible.

4. |Q| = n − 2.

Proof. Consider the set of quartets Q_{2,n−2}. From
Lemmas 2 and 3, Q_{2,n−2} is incompatible but every proper
subset of Q_{2,n−2} is compatible. The set Q_{2,n−2} contains
exactly n − 2 quartets. From the construction, there are
two labels in L which are present in all the quartets in
Q_{2,n−2}. Set one of them to be ℓ. □

The following is a consequence of Theorems 14 and 11. 

Corollary 7. For every n ≥ 3, there exists a set R of
triplets with |L(R)| = n such that all of the following hold.

1. R is incompatible.

2. Every proper subset of R is compatible.

3. |R| = n − 1.

The generalization of the Fitch-Meacham examples
given in [9] can also be expressed in terms of triplets. For
any r ≥ 2, let L = {a, b_1, b_2, ..., b_r}. Let

R_r = {ab_r|b_1} ∪ ⋃_{i=1}^{r−1} {ab_i|b_{i+1}}.

Let Q = {ab|cℓ : ab|c ∈ R_r} for some label ℓ ∉ L. The set
C_Q of r-state characters corresponding to the quartet set
Q is exactly the set of characters built for r in [9]. In the
partition intersection graph of C_Q (following the
terminology in [9]), labels ℓ and a correspond to the end
cliques and the rest of the r labels {b_1, b_2, ..., b_r}
correspond to the r tower cliques. From Lemma 5 and
Theorem 11, R_r is compatible if and only if Q is compatible.
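
For small r, the triplet set R_r can be generated and examined mechanically. The Python sketch below builds R_r and tests compatibility with a recursive component-splitting procedure in the style of the classical BUILD algorithm of Aho et al.; this is a swapped-in standard technique, not a procedure taken from this paper. The label names, the tuple encoding (a, b, c) for ab|c, and the function names are assumptions of this example.

```python
def fitch_meacham_triplets(r):
    """Return R_r = {a b_r | b_1} together with {a b_i | b_{i+1}}, i < r."""
    b = [f"b{i}" for i in range(1, r + 1)]
    R = [("a", b[r - 1], b[0])]
    R += [("a", b[i], b[i + 1]) for i in range(r - 1)]
    return R

def components(R, S):
    """Connected components of the graph [R, S], via union-find."""
    parent = {v: v for v in S}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b, c in R:
        if a in parent and b in parent and c in parent:
            parent[find(a)] = find(b)
    groups = {}
    for v in S:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())

def compatible(R, S=None):
    """Recursive test: fail when [R, S] is connected and |S| >= 3."""
    if S is None:
        S = {x for t in R for x in t}
    if len(S) < 3:
        return True
    comps = components(R, S)
    if len(comps) == 1:
        return False
    return all(compatible(R, comp) for comp in comps)

def minimally_incompatible(R):
    """R is incompatible and removing any single triplet fixes it.

    Checking single removals suffices because every subset of a
    compatible set of triplets is compatible.
    """
    return (not compatible(R)) and all(
        compatible(R[:i] + R[i + 1:]) for i in range(len(R)))

if __name__ == "__main__":
    for r in range(2, 7):
        R = fitch_meacham_triplets(r)
        print(r, len(R), minimally_incompatible(R))
```

Under these assumptions, each R_r in the loop has r triplets over r + 1 labels and is reported as incompatible with every proper subset compatible, which is consistent with the sizes in Corollary 7 for n = r + 1.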

Conclusion 

We have shown that for every r ≥ 2, f(r) ≥ ⌊r/2⌋ · ⌈r/2⌉ + 1,
by showing that for every n ≥ 4, there exists an incompatible
set Q of ⌊(n−2)/2⌋ · ⌈(n−2)/2⌉ + 1 quartets over a set of n
labels such that every proper subset of Q is compatible.
Previous results [1,6,9,13-15], along with our discussion in
Section Incompatible Characters, show that our lower bound on
f(r) is tight for r = 2 and r = 3. For quartets, our
discussion in Section Incompatible Quartets gives an upper
bound on the maximum cardinality of a minimal set of
incompatible quartets. However, this argument does not
extend to multi-state characters. Indeed, an upper bound
on the maximum cardinality of a minimal set of incompatible
r-state characters remains a central open question.
We give the following conjecture.

Conjecture 2. f(r) ∈ Θ(r²).

A less ambitious goal would be to narrow the gap
between the upper bound of O(n³) and the lower bound of
Ω(n²) on the maximum cardinality of a minimal incompatible
set of quartets over n taxa given in Section Incompatible
Quartets. Note that, due to Theorem 6, a proof of
Conjecture 2 would also show that the number of incompatible
quartets given in the statement of Theorem 2 is
as large as possible.

Endnote 

a Rule 2 was stated incorrectly in [7].